Rejection Threshold Estimation for an Unknown Language Model in an OCR Task
Internal identifier: 000684 (Main/Exploration); previous: 000683; next: 000685
Authors: Joaquim Arlandis [Spain]; Juan-Carlos Perez-Cortes [Spain]; Ramon Navarro-Cerdan [Spain]; Rafael Llobet [Spain]
Source:
- Lecture Notes in Computer Science [ 0302-9743 ] ; 2010.
Abstract
Abstract: In an OCR post-processing task, a language model is used to find the best transformation of the OCR hypothesis into a string compatible with the language. The cost of this transformation is used as a confidence value to reject the strings that are less likely to be correct, and the error rate of the accepted strings should be strictly controlled by the user. In this work, the expected error rate distribution of an unknown language model is estimated from a training set composed of known language models. This means that after building a new language model, the user should be able to automatically “fix” the expected error rate at an acceptable level instead of having to deal with an arbitrary threshold.
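The idea of rejecting strings by transformation cost so that the accepted strings stay under a chosen error rate can be illustrated with a minimal Python sketch. It is not the authors' estimation method, which predicts the error rate distribution of an unknown language model from known ones; it only shows the simpler supervised case, assuming a labeled validation set is available. The function name `pick_rejection_threshold`, the `(cost, is_correct)` sample layout, and the sample values are hypothetical.

```python
from typing import List, Optional, Tuple


def pick_rejection_threshold(
    samples: List[Tuple[float, bool]], target_error: float
) -> Optional[float]:
    """Return the largest cost threshold whose accepted set meets the target.

    samples: hypothetical (transformation_cost, is_correct) pairs from a
             labeled validation set; lower cost means higher confidence.
    A string is accepted when its cost is <= the returned threshold.
    Returns None when no threshold satisfies the target error rate.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    best = None
    errors = 0
    for i, (cost, correct) in enumerate(ordered, start=1):
        if not correct:
            errors += 1
        # Strings with equal cost are accepted or rejected together,
        # so only evaluate at the last sample of each tie group.
        if i < len(ordered) and ordered[i][0] == cost:
            continue
        if errors / i <= target_error:
            best = cost
    return best


# Illustrative data only, not results from the paper.
validation = [(0.1, True), (0.2, True), (0.5, False), (0.7, True), (0.9, False)]
threshold = pick_rejection_threshold(validation, target_error=0.25)
```

With a threshold chosen this way, the user specifies an acceptable error rate and the cost cutoff follows from it, rather than having to tune an arbitrary cost value by hand.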
DOI: 10.1007/978-3-642-14980-1_73
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000108
- to stream Istex, to step Curation: 000106
- to stream Istex, to step Checkpoint: 000264
- to stream Main, to step Merge: 000689
- to stream Main, to step Curation: 000684
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Rejection Threshold Estimation for an Unknown Language Model in an OCR Task</title>
<author><name sortKey="Arlandis, Joaquim" sort="Arlandis, Joaquim" uniqKey="Arlandis J" first="Joaquim" last="Arlandis">Joaquim Arlandis</name>
</author>
<author><name sortKey="Perez Cortes, Juan Carlos" sort="Perez Cortes, Juan Carlos" uniqKey="Perez Cortes J" first="Juan-Carlos" last="Perez-Cortes">Juan-Carlos Perez-Cortes</name>
</author>
<author><name sortKey="Navarro Cerdan, Ramon" sort="Navarro Cerdan, Ramon" uniqKey="Navarro Cerdan R" first="Ramon" last="Navarro-Cerdan">Ramon Navarro-Cerdan</name>
</author>
<author><name sortKey="Llobet, Rafael" sort="Llobet, Rafael" uniqKey="Llobet R" first="Rafael" last="Llobet">Rafael Llobet</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:9783140E698735B202412CDF7971320FDA579561</idno>
<date when="2010" year="2010">2010</date>
<idno type="doi">10.1007/978-3-642-14980-1_73</idno>
<idno type="url">https://api.istex.fr/document/9783140E698735B202412CDF7971320FDA579561/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000108</idno>
<idno type="wicri:Area/Istex/Curation">000106</idno>
<idno type="wicri:Area/Istex/Checkpoint">000264</idno>
<idno type="wicri:doubleKey">0302-9743:2010:Arlandis J:rejection:threshold:estimation</idno>
<idno type="wicri:Area/Main/Merge">000689</idno>
<idno type="wicri:Area/Main/Curation">000684</idno>
<idno type="wicri:Area/Main/Exploration">000684</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Rejection Threshold Estimation for an Unknown Language Model in an OCR Task</title>
<author><name sortKey="Arlandis, Joaquim" sort="Arlandis, Joaquim" uniqKey="Arlandis J" first="Joaquim" last="Arlandis">Joaquim Arlandis</name>
<affiliation wicri:level="1"><country xml:lang="fr">Espagne</country>
<wicri:regionArea>Instituto Tecnológico de Informática, Universitat Politècnica de València, Camí de Vera s/n, 46071, València</wicri:regionArea>
<wicri:noRegion>València</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Espagne</country>
</affiliation>
</author>
<author><name sortKey="Perez Cortes, Juan Carlos" sort="Perez Cortes, Juan Carlos" uniqKey="Perez Cortes J" first="Juan-Carlos" last="Perez-Cortes">Juan-Carlos Perez-Cortes</name>
<affiliation wicri:level="1"><country xml:lang="fr">Espagne</country>
<wicri:regionArea>Instituto Tecnológico de Informática, Universitat Politècnica de València, Camí de Vera s/n, 46071, València</wicri:regionArea>
<wicri:noRegion>València</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Espagne</country>
</affiliation>
</author>
<author><name sortKey="Navarro Cerdan, Ramon" sort="Navarro Cerdan, Ramon" uniqKey="Navarro Cerdan R" first="Ramon" last="Navarro-Cerdan">Ramon Navarro-Cerdan</name>
<affiliation wicri:level="1"><country xml:lang="fr">Espagne</country>
<wicri:regionArea>Instituto Tecnológico de Informática, Universitat Politècnica de València, Camí de Vera s/n, 46071, València</wicri:regionArea>
<wicri:noRegion>València</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Espagne</country>
</affiliation>
</author>
<author><name sortKey="Llobet, Rafael" sort="Llobet, Rafael" uniqKey="Llobet R" first="Rafael" last="Llobet">Rafael Llobet</name>
<affiliation wicri:level="1"><country xml:lang="fr">Espagne</country>
<wicri:regionArea>Instituto Tecnológico de Informática, Universitat Politècnica de València, Camí de Vera s/n, 46071, València</wicri:regionArea>
<wicri:noRegion>València</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Espagne</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2010</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">9783140E698735B202412CDF7971320FDA579561</idno>
<idno type="DOI">10.1007/978-3-642-14980-1_73</idno>
<idno type="ChapterID">73</idno>
<idno type="ChapterID">Chap73</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In an OCR post-processing task, a language model is used to find the best transformation of the OCR hypothesis into a string compatible with the language. The cost of this transformation is used as a confidence value to reject the strings that are less likely to be correct, and the error rate of the accepted strings should be strictly controlled by the user. In this work, the expected error rate distribution of an unknown language model is estimated from a training set composed of known language models. This means that after building a new language model, the user should be able to automatically “fix” the expected error rate at an acceptable level instead of having to deal with an arbitrary threshold.</div>
</front>
</TEI>
<affiliations><list><country><li>Espagne</li>
</country>
</list>
<tree><country name="Espagne"><noRegion><name sortKey="Arlandis, Joaquim" sort="Arlandis, Joaquim" uniqKey="Arlandis J" first="Joaquim" last="Arlandis">Joaquim Arlandis</name>
</noRegion>
<name sortKey="Arlandis, Joaquim" sort="Arlandis, Joaquim" uniqKey="Arlandis J" first="Joaquim" last="Arlandis">Joaquim Arlandis</name>
<name sortKey="Llobet, Rafael" sort="Llobet, Rafael" uniqKey="Llobet R" first="Rafael" last="Llobet">Rafael Llobet</name>
<name sortKey="Llobet, Rafael" sort="Llobet, Rafael" uniqKey="Llobet R" first="Rafael" last="Llobet">Rafael Llobet</name>
<name sortKey="Navarro Cerdan, Ramon" sort="Navarro Cerdan, Ramon" uniqKey="Navarro Cerdan R" first="Ramon" last="Navarro-Cerdan">Ramon Navarro-Cerdan</name>
<name sortKey="Navarro Cerdan, Ramon" sort="Navarro Cerdan, Ramon" uniqKey="Navarro Cerdan R" first="Ramon" last="Navarro-Cerdan">Ramon Navarro-Cerdan</name>
<name sortKey="Perez Cortes, Juan Carlos" sort="Perez Cortes, Juan Carlos" uniqKey="Perez Cortes J" first="Juan-Carlos" last="Perez-Cortes">Juan-Carlos Perez-Cortes</name>
<name sortKey="Perez Cortes, Juan Carlos" sort="Perez Cortes, Juan Carlos" uniqKey="Perez Cortes J" first="Juan-Carlos" last="Perez-Cortes">Juan-Carlos Perez-Cortes</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000684 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000684 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:9783140E698735B202412CDF7971320FDA579561 |texte= Rejection Threshold Estimation for an Unknown Language Model in an OCR Task }}
This area was generated with Dilib version V0.6.32.